microsoft research
- North America > United States > Illinois > Cook County > Chicago (0.04)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.70)
AI Red-Teaming is a Sociotechnical System. Now What?
Gillespie, Tarleton, Shaw, Ryland, Gray, Mary L., Suh, Jina
Whether tapped directly on the web, or embedded in software suites, search engines, and social media platforms, LLMs are everywhere. When a technology jumps this quickly from theoretical plaything to consumer service, many other elements are also settling in around it, without much forethought: interfaces, policies, business models, labor arrangements, infrastructural assurances, complementary technologies, public claims, advertising campaigns, regulations. Researchers studying the workings and implications of these technologies, across computer science, engineering, the social sciences, humanities, and law, must gear up just as fast to study not just the core technology, but the sociotechnical system taking shape around it [19]. Many of these decisions, arrangements, and infrastructures may turn out to be as consequential for users and the broader public as the core technology itself. But the boisterous promises and debates that surround a new technology can obscure these other essential elements that make technologies always more than the sum of their engineered parts. In this essay, we hope to call upon computer scientists and social scientists alike to pay closer, critical attention to the phenomenon of "red-teaming."
- North America > United States > California (0.14)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > Washington > King County > Redmond (0.04)
- Law (1.00)
- Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
- Information Technology > Security & Privacy (0.93)
- Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.91)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.30)
GEMS: Generative Expert Metric System through Iterative Prompt Priming
Cheng, Ti-Chung, Badea, Carmen, Bird, Christian, Zimmermann, Thomas, DeLine, Robert, Forsgren, Nicole, Ford, Denae
Across domains, metrics and measurements are fundamental to identifying challenges, informing decisions, and resolving conflicts. Despite the abundance of data available in this information age, not only can it be challenging for a single expert to work across multi-disciplinary data, but non-experts can also find it unintuitive to create effective measures or transform theories into context-specific metrics that are chosen appropriately. This technical report addresses this challenge by examining software communities within large software corporations, where different measures are used as proxies to locate counterparts within the organization to transfer tacit knowledge. We propose a prompt-engineering framework inspired by neural activities, demonstrating that generative models can extract and summarize theories and perform basic reasoning, thereby transforming concepts into context-aware metrics to support software communities given software repository data. While this research zoomed in on software communities, we believe the framework's applicability extends across various fields, showcasing expert-theory-inspired metrics that aid in triaging complex challenges.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > Illinois > Champaign County > Urbana (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
AutoVerus: Automated Proof Generation for Rust Code
Yang, Chenyuan, Li, Xuheng, Misu, Md Rakib Hossain, Yao, Jianan, Cui, Weidong, Gong, Yeyun, Hawblitzel, Chris, Lahiri, Shuvendu, Lorch, Jacob R., Lu, Shuai, Yang, Fan, Zhou, Ziqiao, Lu, Shan
Generative AI has shown its value for many software engineering tasks. Still in its infancy, large language model (LLM)-based proof generation lags behind LLM-based code generation. In this paper, we present AutoVerus. AutoVerus uses LLMs to automatically generate correctness proofs for Rust code. AutoVerus is designed to match the unique features of Verus, a verification tool that can prove the correctness of Rust code using proofs and specifications also written in Rust. AutoVerus consists of a network of LLM agents that are crafted and orchestrated to mimic human experts' three phases of proof construction: preliminary proof generation, proof refinement guided by generic tips, and proof debugging guided by verification errors. To thoroughly evaluate AutoVerus and help foster future research in this direction, we have built a benchmark suite of 150 non-trivial proof tasks, based on existing code-generation benchmarks and verification benchmarks. Our evaluation shows that AutoVerus can automatically generate correct proofs for more than 90% of them, with more than half of them tackled in less than 30 seconds or 3 LLM calls.
- Europe > Austria > Vienna (0.14)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Asia > China (0.04)
The Human Factor in AI Red Teaming: Perspectives from Social and Collaborative Computing
Zhang, Alice Qian, Shaw, Ryland, Anthis, Jacy Reese, Milton, Ashlee, Tseng, Emily, Suh, Jina, Ahmad, Lama, Kumar, Ram Shankar Siva, Posada, Julian, Shestakofsky, Benjamin, Roberts, Sarah T., Gray, Mary L.
Rapid progress in general-purpose AI has sparked significant interest in "red teaming," a practice of adversarial testing originating in military and cybersecurity applications. AI red teaming raises many questions about the human factor, such as how red teamers are selected, biases and blindspots in how tests are conducted, and harmful content's psychological effects on red teamers. A growing body of HCI and CSCW literature examines related practices, including data labeling, content moderation, and algorithmic auditing. However, few, if any, have investigated red teaming itself. This workshop seeks to consider the conceptual and empirical challenges associated with this practice, often rendered opaque by non-disclosure agreements. Future studies may explore topics ranging from fairness to mental health and other areas of potential harm. We aim to facilitate a community of researchers and practitioners who can begin to meet these challenges with creativity, innovation, and thoughtful reflection.
- North America > United States > California > Los Angeles County > Los Angeles (0.28)
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.14)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
- Government (1.00)
- Information Technology > Security & Privacy (0.89)
- Health & Medicine > Consumer Health (0.71)
- Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.34)
Biomedical knowledge graph-enhanced prompt generation for large language models
Soman, Karthik, Rose, Peter W, Morris, John H, Akbas, Rabia E, Smith, Brett, Peetoom, Braian, Villouta-Reyes, Catalina, Cerono, Gabriel, Shi, Yongmei, Rizk-Jackson, Angela, Israni, Sharat, Nelson, Charlotte A, Huang, Sui, Baranzini, Sergio E
Large Language Models (LLMs) have been driving progress in AI at an unprecedented rate, yet still face challenges in knowledge-intensive domains like biomedicine. Solutions such as pre-training and domain-specific fine-tuning add substantial computational overhead, and the latter requires domain expertise. External knowledge infusion is task-specific and requires model training. Here, we introduce a task-agnostic Knowledge Graph-based Retrieval Augmented Generation (KG-RAG) framework by leveraging the massive biomedical KG SPOKE with LLMs such as Llama-2-13b, GPT-3.5-Turbo and GPT-4, to generate meaningful biomedical text rooted in established knowledge. KG-RAG consistently enhanced the performance of LLMs across various prompt types, including one-hop and two-hop prompts, drug repurposing queries, biomedical true/false questions, and multiple-choice questions (MCQ). Notably, KG-RAG provides a remarkable 71% boost in the performance of the Llama-2 model on the challenging MCQ dataset, demonstrating the framework's capacity to empower open-source models with fewer parameters for domain-specific questions. Furthermore, KG-RAG enhanced the performance of proprietary GPT models, such as GPT-3.5, which exhibited improvement over GPT-4 in context utilization on MCQ data. Our approach was also able to address drug repurposing questions, returning meaningful repurposing suggestions. In summary, the proposed framework combines explicit and implicit knowledge of KG and LLM, respectively, in an optimized fashion, thus enhancing the adaptability of general-purpose LLMs to tackle domain-specific questions in a unified framework.
- North America > United States > California > San Francisco County > San Francisco (0.29)
- North America > United States > Washington > King County > Redmond (0.05)
- Asia > China (0.05)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
AdaMedGraph: Adaboosting Graph Neural Networks for Personalized Medicine
Lian, Jie, Luo, Xufang, Shan, Caihua, Han, Dongqi, Vardhanabhuti, Varut, Li, Dongsheng
Precision medicine tailored to individual patients has gained significant attention in recent times. Machine learning techniques are now employed to process personalized data from various sources, including images, genetics, and assessments. These techniques have demonstrated good outcomes in many clinical prediction tasks. Notably, the approach of constructing graphs by linking similar patients and then applying graph neural networks (GNNs) stands out, because related information from analogous patients is aggregated and considered for prediction. However, selecting the appropriate edge feature to define patient similarity and construct the graph is challenging, given that each patient is depicted by high-dimensional features from diverse sources. Previous studies rely on human expertise to select the edge feature, which is neither scalable nor efficient in pinpointing crucial edge features for complex diseases. In this paper, we propose a novel algorithm named AdaMedGraph, which can automatically select important features to construct multiple patient similarity graphs, and train GNNs based on these graphs as weak learners in adaptive boosting. AdaMedGraph is evaluated on two real-world medical scenarios and shows superior performance.
- Asia > China > Hong Kong (0.05)
- North America > United States > Indiana (0.04)
- North America > Canada > Quebec > Montreal (0.04)
Safurai-Csharp: Harnessing Synthetic Data to improve language-specific Code LLM
Cifarelli, Davide, Boiardi, Leonardo, Puppo, Alessandro, Jovanovic, Leon
This paper introduces Safurai-Csharp, an open-source model designed to specialize in the generation, completion, and debugging of C# code. Safurai-Csharp is built upon the novel CodeLlama 34B model and leverages the EvolInstruct technique, creating a refined and expanded dataset for its fine-tuning process. The results of its performance, a notable score of 56.33% on the Manual MultiPL-E benchmark (Zero-Shot, Pass@1), signal its high capacity to streamline developers' workflows and aid code learning. It shows promise in setting new stakes in the landscape of open-source C# LLMs and hopes to inspire more inclusive and wide-ranging development in the field of language-specific LLMs.
The Workers Behind AI Rarely See Its Rewards. This Indian Startup Wants to Fix That
In the shade of a coconut palm, Chandrika tilts her smartphone screen to avoid the sun's glare. It is early morning in Alahalli village in the southern Indian state of Karnataka, but the heat and humidity are rising fast. As Chandrika scrolls, she clicks on several audio clips in succession, demonstrating the simplicity of the app she recently started using. At each tap, the sound of her voice speaking her mother tongue emerges from the phone. Before she started using this app, 30-year-old Chandrika (who, like many South Indians, uses the first letter of her father's name, K., instead of a last name) had just 184 rupees ($2.25) in her bank account. But in return for around six hours of work spread over several days in late April, she received 2,570 rupees ($31.30). That's roughly the same amount she makes in a month of working as a teacher at a distant school, after the cost of the three buses it takes her to get there and back. Just by reading text aloud in her native language of Kannada, spoken by around 60 million people mostly in central and southern India, Chandrika has used this app to earn an hourly wage of about $5, nearly 20 times the Indian minimum. And in a few days, more money will arrive: a 50% bonus, awarded once the voice clips are validated as accurate. Chandrika's voice can fetch this sum because of the boom in artificial intelligence (AI). Right now, cutting-edge AIs (for example, large language models like ChatGPT) work best in languages like English, where text and audio data is abundant online.
- North America > United States > California (0.14)
- Asia > India > Karnataka > Bengaluru (0.14)
- Asia > Indonesia > Bali (0.05)
- Information Technology (0.95)
- Health & Medicine > Therapeutic Area (0.70)
- Government (0.68)
Sparks of Artificial General Intelligence: Early experiments with GPT-4 - Microsoft Research
Artificial intelligence (AI) researchers have been developing and refining large language models (LLMs) that exhibit remarkable capabilities across a variety of domains and tasks, challenging our understanding of learning and cognition. The latest model developed by OpenAI, GPT-4, was trained using an unprecedented scale of compute and data. In this paper, we report on our investigation of an early version of GPT-4, when it was still in active development by OpenAI. We contend that (this early version of) GPT-4 is part of a new cohort of LLMs (along with ChatGPT and Google's PaLM for example) that exhibit more general intelligence than previous AI models. We discuss the rising capabilities and implications of these models.
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.51)